feat(dwarf): DwarfHandling::Remap end-to-end (#143 DWARF Phase 2 inc 3b) #206

Merged: avrabe merged 1 commit into main from feat/dwarf-remap-inc3b on May 29, 2026

@avrabe commented May 29, 2026

Summary

The final piece of DWARF Phase 2 (#143): DwarfHandling::Remap turns the AddressRemap engine (v0.18.0) into a working, debug-info-preserving fusion path. It reads an input core module's .debug_* sections, translates every code address to the fused code section, and re-serializes a single remapped DWARF set via gimli::write::Dwarf::from. Exposed as meld fuse --dwarf remap.

This completes the increment arc:

| Inc | Release | Anchor |
|-----|---------|--------|
| 1 | v0.16.0 | per-function base (component-provenance v2 code_range) |
| 2 | v0.17.0 | intra-function InstrOffsetMap (LEB drift) |
| 3a | v0.18.0 | AddressRemap engine (composes 1+2) |
| 3b | this PR | gimli .debug_* rewrite + DwarfHandling::Remap |
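
To illustrate how increments 1 and 2 compose in 3a, here is a minimal sketch of an address translation that adds a per-function base to an intra-function offset lookup. The `FuncRemap` struct and the `translate` function are hypothetical names chosen for this example; they are not taken from meld's actual API.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-function remap record (illustrative only, not meld's types).
struct FuncRemap {
    /// Start of the function body in the input code section.
    old_base: u64,
    /// Length of the function body in the input code section.
    old_len: u64,
    /// Start of the same function body in the fused code section.
    new_base: u64,
    /// Input instruction offset -> output instruction offset,
    /// both relative to the start of the function body.
    instr_offsets: BTreeMap<u64, u64>,
}

/// Translate an input code address to the fused code section.
/// Returns `None` when the address is in no known function or does not
/// fall on a recorded instruction boundary.
fn translate(funcs: &[FuncRemap], addr: u64) -> Option<u64> {
    // Step 1 (per-function base): find the function containing `addr`.
    let f = funcs
        .iter()
        .find(|f| addr >= f.old_base && addr < f.old_base + f.old_len)?;
    // Step 2 (intra-function drift): look up the instruction's new offset.
    let new_off = f.instr_offsets.get(&(addr - f.old_base))?;
    Some(f.new_base + *new_off)
}
```

With a function that moved from `0x10` to `0x40` and whose instruction at offset 4 drifted to offset 5, `translate` would map `0x14` to `0x45` and return `None` for any address that is not an instruction start.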

Design — three decisions that de-risk it

  1. Post-hoc parallel operator walk. The instruction offset map is recovered by walking the input and final-output operator streams in lockstep, not captured during the merge. The merge re-rewrites bodies after adapter wiring (lib.rs:694), so a captured map would go stale; the post-hoc walk reflects whatever rewriting actually happened and threads no state through the hot path. A per-function operator-count or locals-prefix mismatch (e.g. memory-rebasing inserted scratch ops) aborts the remap.

  2. Correct-or-strip. gimli::write::Dwarf::from is all-or-nothing on addresses — a None from convert_address aborts the whole conversion. That's used as the safety gate: only the structurally-invariant code-section base (address 0) is special-cased; any other unmapped address fails the conversion and falls back to stripping. Never emit a wrong address.

  3. Single-DWARF-source scope. DWARF is per-input-core-module but the output has one code section. Merging N independent DWARF unit sets is a separate fidelity problem — deferred. Exactly one DWARF-bearing source → full remap; more than one → strip + warn; zero → no-op. The common single-component case (main module carries DWARF, hand-written adapter modules don't) is fully served.
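
Decisions 2 and 3 above can be sketched as a single policy function. This is a simplified model under stated assumptions: DWARF is reduced to a list of code addresses per source, the `DwarfOutcome` enum and `remap_or_strip` are hypothetical names, and address 0 is assumed to map to itself. It does not call gimli.

```rust
/// Outcome of the DWARF step (hypothetical enum, not meld's real types).
#[derive(Debug, PartialEq)]
enum DwarfOutcome {
    /// No input carried DWARF; nothing to do.
    NoOp,
    /// Every address translated; the remapped set is emitted.
    Remapped(Vec<u64>),
    /// DWARF is dropped, with the reason reported to the user.
    Stripped(&'static str),
}

/// `sources` holds the code addresses found in each DWARF-bearing input.
/// `translate` returns `None` for any address it cannot map.
fn remap_or_strip(
    sources: &[Vec<u64>],
    translate: impl Fn(u64) -> Option<u64>,
) -> DwarfOutcome {
    match sources {
        [] => DwarfOutcome::NoOp,
        [addrs] => {
            // All-or-nothing: collecting into Option<Vec<_>> yields None
            // as soon as a single address fails to translate.
            let out: Option<Vec<u64>> = addrs
                .iter()
                // Assumption: the code-section base (0) maps to itself.
                .map(|&a| if a == 0 { Some(0) } else { translate(a) })
                .collect();
            match out {
                Some(v) => DwarfOutcome::Remapped(v),
                None => DwarfOutcome::Stripped("unmapped address"),
            }
        }
        _ => DwarfOutcome::Stripped("multiple DWARF sources"),
    }
}
```

The design point is that the failure mode is always "no debug info", never "wrong debug info": one unmapped address discards the whole set.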

A three-pass encode keeps the remapped .debug_* inside the attestation/provenance-hashed bytes (trailing custom sections don't shift code offsets, so the remap built from pass A is valid for the final output).
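
The claim that trailing custom sections do not shift code offsets follows from the Wasm binary layout, where sections are laid out one after another. The sketch below uses a toy encoder (hypothetical helpers `encode` and `code_payload_offset`, with section payloads treated as opaque bytes) to show that appending a section after the code section leaves the code payload offset unchanged, while placing one before it does not.

```rust
/// Unsigned LEB128 encoding of `v`.
fn leb128(mut v: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Serialize `(section id, payload)` pairs after the 8-byte Wasm header.
/// Payloads are opaque here; a real custom section would begin with a name.
fn encode(sections: &[(u8, Vec<u8>)]) -> Vec<u8> {
    let mut out = b"\0asm\x01\0\0\0".to_vec();
    for (id, payload) in sections {
        out.push(*id);
        out.extend(leb128(payload.len() as u32));
        out.extend(payload);
    }
    out
}

/// Byte offset of the code section's payload (section id 10), if present.
/// Assumes `module` was produced by `encode`, so it is well-formed.
fn code_payload_offset(module: &[u8]) -> Option<usize> {
    let mut pos = 8; // skip magic + version
    while pos < module.len() {
        let id = module[pos];
        pos += 1;
        // Decode the LEB128 section size.
        let (mut size, mut shift) = (0usize, 0u32);
        loop {
            let b = module[pos];
            pos += 1;
            size |= ((b & 0x7f) as usize) << shift;
            shift += 7;
            if b & 0x80 == 0 {
                break;
            }
        }
        if id == 10 {
            return Some(pos);
        }
        pos += size;
    }
    None
}
```

Under this model, a remap computed against the first encoding stays valid after the `.debug_*` custom sections (id 0) are appended at the end.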

Verification

  • LS-D-1 (new, approved): wrong remapped DWARF address → de-grounded downstream coverage/breakpoints. Gated by dwarf::tests::ls_d_1_remap_translates_low_pc — a full gimli read→convert→write→read oracle that builds real input DWARF, remaps a subprogram's low_pc, re-parses the output DWARF, and asserts the address was actually translated.
  • Parallel-walk unit tests: identity round-trip from real wasm bytes + operator-count-mismatch abort.
  • Multi-source strip-fallback integration test on lists.wasm.
  • The six translate_* unit tests (3a) pin the remap math.
  • tools/run_ls_verification.py: [ OK ] LS-D-1 (1 pass).
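
The two parallel-walk properties exercised by the unit tests above (identity round-trip and operator-count-mismatch abort) can be illustrated with a simplified model. Here each operator is reduced to its encoded byte length, and `parallel_walk` is a hypothetical name; the real walk operates on parsed Wasm operators and also checks the locals prefix, which this sketch omits.

```rust
use std::collections::BTreeMap;

/// Walk the input and output operator streams in lockstep and record, for
/// each operator, its input byte offset -> output byte offset (both relative
/// to the function body). Each slice holds one encoded length per operator.
/// Returns `None` when the operator counts differ, which aborts the remap.
fn parallel_walk(input_lens: &[u64], output_lens: &[u64]) -> Option<BTreeMap<u64, u64>> {
    if input_lens.len() != output_lens.len() {
        return None; // operator-count mismatch
    }
    let mut map = BTreeMap::new();
    let (mut in_off, mut out_off) = (0u64, 0u64);
    for (i, o) in input_lens.iter().zip(output_lens) {
        map.insert(in_off, out_off);
        in_off += i;
        out_off += o;
    }
    Some(map)
}
```

Walking a stream against itself yields an identity map; if one operator grows by a byte in the output, every later operator's offset shifts by one; and a stream with an extra operator on either side returns `None`.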

Residual (documented in LS-D-1): DW_AT_high_pc encoded as a length (DW_FORM_data*, the common Rust/LLVM encoding) is copied verbatim — gimli treats it as a constant, not an address — so a function's reported byte length may be off by intra-function LEB drift. low_pc and the line-number program (what debuggers and pulseengine/witness use) are correct.
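
A small worked example of this residual, with illustrative numbers only. It assumes the drift comes from an immediate whose LEB128 encoding grows during fusion (for instance an index that moves from 5 to 200); `uleb_len` and `derived_end` are hypothetical helpers.

```rust
/// Number of bytes in the unsigned LEB128 encoding of `v`.
fn uleb_len(mut v: u32) -> u64 {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}

/// End address a consumer derives when DW_AT_high_pc is stored as a length:
/// the end is low_pc plus that length.
fn derived_end(low_pc: u64, high_pc_len: u64) -> u64 {
    low_pc + high_pc_len
}
```

If a 10-byte function body contains one immediate that grows from `uleb_len(5) == 1` byte to `uleb_len(200) == 2` bytes, the fused body is 11 bytes long. With low_pc correctly remapped to, say, `0x40`, a consumer using the verbatim length computes `derived_end(0x40, 10) == 0x4a`, one byte short of the true end `0x4b`. The start address and the line program are unaffected.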

Notes

  • Adds the gimli dependency.
  • dwarf.rs Tier-5 registration is a separate follow-up PR (the claude-code-action byte-identical-workflow constraint forbids bundling a mythos-auto.yml edit with code changes).

🤖 Generated with Claude Code

Wire the v0.18.0 AddressRemap engine into a real `.debug_*` rewrite.
`DwarfHandling::Remap` reads an input core module's DWARF, translates
every code address to the fused code section, and re-serializes a
single remapped set via `gimli::write::Dwarf::from`. Exposed as
`meld fuse --dwarf remap`.

Design (de-risked):
- Recover the per-function instruction offset map POST-HOC by walking
  the input and final-output operator streams in lockstep, rather than
  capturing during the merge — so it reflects the adapter-wiring
  re-rewrite and threads no state through the hot path. A per-function
  operator-count / locals-prefix mismatch aborts the remap.
- Correct-or-strip: `gimli::write::Dwarf::from` is all-or-nothing on
  addresses, used as the safety gate. Only the code-section base
  (address 0) is special-cased; any other unmapped address fails the
  conversion and falls back to stripping — never a wrong address.
- Single DWARF source supported; multi-source inputs strip with a
  warning (merging independent unit sets deferred). Zero sources is a
  no-op.
- Three-pass encode so remapped `.debug_*` land in the attestation/
  provenance-hashed bytes (trailing custom sections don't shift code
  offsets).

Verification:
- New LS-D-1 (approved): wrong remapped address -> de-grounded
  downstream coverage/breakpoints. Gated by
  `dwarf::tests::ls_d_1_remap_translates_low_pc` -- a full gimli
  read->convert->write->read oracle asserting low_pc is actually
  translated. Plus the parallel-walk unit tests (identity + abort) and
  the multi-source strip-fallback integration test on lists.wasm.
- Residual: `DW_AT_high_pc` as a length is copied verbatim (may be off
  by intra-function LEB drift); low_pc + line program are correct.

Adds the `gimli` dependency. dwarf.rs Tier-5 registration follows in a
separate workflow PR (byte-identical-workflow constraint).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

LS-N verification gate

⚠️ 36/38 verified — 2 missing regression tests

| Status | Count |
|--------|-------|
| Passed (≥1 test, all green) | 36 |
| Failed (≥1 test failure) | 0 |
| Missing (no ls_*_NN_* test found) | 2 |

Approved loss-scenarios.yaml entries are expected to have a regression test named ls_<letter>_<num>_* (e.g. LS-A-11 → ls_a_11_*). The gate runs each prefix via cargo test --lib --no-fail-fast and aggregates pass/fail/missing.
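
The naming convention can be expressed as a small mapping from a loss-scenario id to the test-name prefix the gate looks for. This is a sketch of the convention only: `ls_test_prefix` is a hypothetical function and says nothing about how tools/run_ls_verification.py is actually implemented.

```rust
/// Map a loss-scenario id such as "LS-A-11" to the expected test-name
/// prefix "ls_a_11_". Returns `None` for ids that do not have the shape
/// LS-<uppercase letters>-<digits>.
fn ls_test_prefix(id: &str) -> Option<String> {
    let mut parts = id.split('-');
    let (ls, letter, num) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || ls != "LS" {
        return None;
    }
    if letter.is_empty() || !letter.chars().all(|c| c.is_ascii_uppercase()) {
        return None;
    }
    if num.is_empty() || !num.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(format!("ls_{}_{}_", letter.to_ascii_lowercase(), num))
}
```

For example, `LS-D-1` yields `ls_d_1_`, which is the prefix of `ls_d_1_remap_translates_low_pc` from this PR.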

Failed LS entries

(none)

Missing regression tests
  • LS-R-13
  • LS-M-6

Updated automatically by tools/post_verification_comment.py.
Source of truth: safety/stpa/loss-scenarios.yaml.

avrabe added a commit that referenced this pull request May 29, 2026
`meld-core/src/dwarf.rs` (the DWARF AddressRemap engine + the
`DwarfHandling::Remap` rewrite, #143) is correctness-critical: a wrong
remapped code address silently de-grounds downstream coverage and
breakpoints (LS-D-1). Add it to the Mythos auto-scan Tier-5 file list so
future diffs get the clean-room AI delta pass.

Standalone workflow-only change: the claude-code-action identity check
requires the workflow file be byte-identical to main, so this cannot be
bundled with the inc 3b code PR (#206).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@avrabe merged commit b028e43 into main May 29, 2026
14 checks passed
@avrabe deleted the feat/dwarf-remap-inc3b branch May 29, 2026 18:19